Abstract

In this paper, we consider the problem of obtaining the best $k$-sparse
solution of $Ax = y$ subject to the constraint that the columns of $A$ are
orthogonal. The naive approach for obtaining a solution to this problem
has exponential complexity, and there exist $\ell_1$ regularization methods such
as Lasso to obtain approximate solutions. In this paper, we show that we
can obtain an exact solution to the problem, with much less computational
effort compared to the brute force search, when the columns of $A$ are
orthogonal.

1 Introduction 



We consider the following problem:



Problem 1 ($P_1$). Find $u^k \in \mathbb{R}^n$ such that

$$\|Au^k - y\| = \inf\{\|Ax - y\| : x \in \mathbb{R}^n,\ \|x\|_0 = k\}.$$

Here, the dimensions of $u^k$, $A$, $y$, $x$ are $n \times 1$, $m \times n$, $m \times 1$, $n \times 1$ respectively, with
$m > n$. Obtaining sparse solutions to overdetermined systems of equations has
a long history in the statistics community. For example, the Lasso algorithm
due to Tibshirani ([Tibshirani (1996)]) tries to solve the following problem (for
a fixed $\lambda$):

find $x$ such that $\|Ax - y\|^2 + \lambda \|x\|_1$ is minimized



The $\ell_1$ norm is employed here rather than the $\ell_0$ norm because $\ell_1$ penalization makes
the problem tractable (convex optimization methods can be used to solve it)
and because minimizing the $\ell_1$ norm typically provides sparse solutions.
The minimizer of the Lasso is obtained by solving a series of quadratic
programming problems ([Tibshirani (1996)]). Other computational techniques
also exist in the literature for solving the Lasso, e.g. see [Osborne et al. (2000a)],
[Osborne et al. (2000b)].



The Lasso method is general and is applicable to any matrix. However, the
parameter $\lambda$ in the unconstrained formulation of the problem has to be tuned
to obtain satisfactory results. $\lambda$ is usually obtained by cross-validation.

Sparse solutions to overdetermined systems are also considered in the paper
by [Candes et al. (2005)]. The authors study the problem of reconstructing $x$
exactly when the observed data are corrupted by noise. If

$$y = Ax + e$$

the authors give conditions on the matrix $A$ and a minimization algorithm which
recovers $x$ exactly subject to a constraint on the number of non-zero entries of $e$.
However, this work is not directly relevant to the problem under consideration.
In this paper, we give an explicit solution to $P_1$, under the constraint that
the columns of $A$ are orthogonal. We show that the solution given is equivalent
(i.e. has equal error) to any solution obtained by a brute force search. One
advantage of the method over the Lasso is that no tuning is necessary. As the
proposed method still involves computing the inverse of $A^T A$, it might not scale
well to problems where $n$ is very large.

2 Equivalence of solutions 

We first fix some notation. If $x$ is a vector, let $x^2$ denote the element-wise square
of $x$. Let $z = (A^T A)^{-1} (A^T y)^2$ and let $z_s$ be the result of sorting $z$ (in a stable
manner) in decreasing order. Let $f$ be a permutation such that $f(i) = j$
implies that $z_s(i) = z(j)$, where $x(i)$ denotes the $i^{th}$ element of the vector $x$.
Define $x_{pi} = A^{+} y$, where $A^{+}$ denotes the pseudo-inverse of $A$, and finally,

$$v^k(f^{-1}(i)) = x_{pi}(f^{-1}(i)), \quad \text{if } i \le k$$

$$v^k(f^{-1}(i)) = 0, \quad \text{if } i > k$$

Note here that $v^k$ can be computed with much less computational effort than
the brute force search as the inverse $(A^T A)^{-1}$ is only computed once.
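The construction of $v^k$ can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `sparse_solution` and the example matrix are hypothetical, $A$ is assumed to have full column rank, and the sketch reads the definition as "keep the $k$ entries of the pseudo-inverse solution whose $z$ values are largest, and zero the rest".

```python
import numpy as np

def sparse_solution(A, y, k):
    """Hypothetical helper: keep the k entries of x_pi ranked highest by z."""
    G = A.T @ A                               # Gram matrix A^T A
    b = A.T @ y
    x_pi = np.linalg.solve(G, b)              # pseudo-inverse solution (full column rank assumed)
    z = np.linalg.solve(G, b ** 2)            # z = (A^T A)^{-1} (A^T y)^2, element-wise square
    order = np.argsort(-z, kind="stable")     # stable sort of z in decreasing order
    v = np.zeros_like(x_pi)
    keep = order[:k]                          # original indices of the k largest z values
    v[keep] = x_pi[keep]
    return v

# Illustrative matrix with orthogonal (but not orthonormal) columns.
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0],
              [0.0, 0.0, 0.0]])
y = np.array([1.0, 4.0, 3.0, 5.0])
v2 = sparse_solution(A, y, k=2)
```

For this example $A^T y = [1, 8, 9]^T$ and $z = [1, 16, 9]^T$, so the second and third entries of the pseudo-inverse solution $[1, 2, 1]^T$ are kept.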

Proposition 2.1. Let $A$ be an $m \times n$ matrix such that $m > n$, $n > 1$. Assume
that $A$ has full column rank. If the columns of $A$ are orthogonal, then

$$\|Au^k - y\| = \|Av^k - y\| \quad \forall y \in \mathbb{R}^m,\ \forall k \in \{1, \ldots, n\}$$

Proof. As $u^k$ is a solution to $P_1$, its non-zero elements should be of the form
$u = (AC)^{+} y$ where $C$ is an $n \times k$ column picking matrix. Similarly, the non-zero
elements of $v^k$ can be written as $v = R A^{+} y$ where $R$ is a $k \times n$ row picking
matrix. We then have $\|Au^k - y\| = \|ACu - y\|$ and $\|Av^k - y\| = \|AR^T v - y\|$.
Therefore,

$$(ACu - y)^T (ACu - y)$$
$$= u^T C^T A^T A C u - 2 y^T A C u + y^T y$$
$$= y^T ((AC)^{+})^T C^T A^T A C (AC)^{+} y - 2 y^T AC (AC)^{+} y + y^T y$$
$$= -y^T AC (AC)^{+} y + y^T y \qquad (1)$$

$$(AR^T v - y)^T (AR^T v - y)$$
$$= v^T R A^T A R^T v - 2 y^T A R^T v + y^T y$$
$$= y^T (A^{+})^T R^T R A^T A R^T R A^{+} y - 2 y^T A R^T R A^{+} y + y^T y \qquad (2)$$
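Identity (1) does not depend on orthogonality, since $AC(AC)^{+}$ is the orthogonal projector onto the span of the selected columns. A quick numerical check is sketched below; the random matrix, the seed and the choice of support are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))          # arbitrary m x n matrix with m > n
y = rng.standard_normal(6)

support = [0, 2]                         # hypothetical choice of k = 2 columns
C = np.eye(4)[:, support]                # n x k column picking matrix
AC = A @ C
AC_pinv = np.linalg.pinv(AC)             # (AC)^+
u = AC_pinv @ y                          # least-squares coefficients on the chosen columns

lhs = (AC @ u - y) @ (AC @ u - y)        # squared residual, left-hand side of (1)
rhs = -y @ AC @ AC_pinv @ y + y @ y      # last line of (1)
```

Up to rounding, `lhs` and `rhs` agree for any choice of support.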

Now, we prove that if the columns of $A$ are assumed to be orthogonal, i.e. if $A^T A$
is a diagonal matrix, then $(ACu - y)^T (ACu - y) = (AR^T v - y)^T (AR^T v - y)$.
To show this, we first try to find the $C$ which minimizes $(ACu - y)^T (ACu - y)$.
As has been shown above, this is equivalent to minimizing

$$-y^T AC (AC)^{+} y + y^T y$$

or maximizing

$$(A^T y)^T C (C^T A^T A C)^{-1} C^T (A^T y) \qquad (3)$$

As $A^T A$ is diagonal, $C (C^T A^T A C)^{-1} C^T = C C^T (A^T A)^{-1}$ and hence,

$$(A^T y)^T C (C^T A^T A C)^{-1} C^T (A^T y) = \sum_{i : (CC^T)_{ii} = 1} \frac{(A^T y)_i^2}{\lambda_i} \qquad (4)$$

where the $\lambda_i$'s are the diagonal elements of $A^T A$ (note that the $\lambda_i$'s are strictly
positive real numbers). The maximum possible value of $\sum_{i : (CC^T)_{ii} = 1} \frac{(A^T y)_i^2}{\lambda_i}$ is
$\sum_{i : (R^T R)_{ii} = 1} \frac{(A^T y)_i^2}{\lambda_i}$, as $R$ picks the maximum $k$ components of $(A^T A)^{-1} (A^T y)^2$.
So without loss of generality we can assume that $C = R^T$, as the error $\|ACu - y\|$
cannot be minimised any further.

The $i^{th}$ row of the matrix $R^T R A^T A$ equals the $i^{th}$ row of $A^T A$ if $(R^T R)_{ii} = 1$
and equals the zero row otherwise. Therefore,

$$(A^T A)^{-1} R^T R (A^T A) R^T R (A^T A)^{-1} - 2 R^T R (A^T A)^{-1} = -R^T R (A^T A)^{-1}$$

From $C = R^T$ we finally get

$$(A^T A)^{-1} R^T R (A^T A) R^T R (A^T A)^{-1} + C (C^T (A^T A) C)^{-1} C^T - 2 R^T R (A^T A)^{-1} = 0$$

□
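The statement of Proposition 2.1 can be checked numerically by comparing the explicit solution against an exhaustive search over all supports of size $k$. The sketch below is an illustration only: the helper names, the random test matrix and the column scaling are assumptions, and the exhaustive search is the naive exponential-cost procedure, not an optimized one.

```python
import itertools
import numpy as np

def brute_force_error(A, y, k):
    """Smallest residual norm over all supports of size k (exponential cost)."""
    best = np.inf
    for support in itertools.combinations(range(A.shape[1]), k):
        cols = A[:, list(support)]
        coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
        best = min(best, np.linalg.norm(cols @ coef - y))
    return best

def explicit_error(A, y, k):
    """Residual norm of the explicit solution: top-k entries of x_pi ranked by z."""
    G = A.T @ A
    b = A.T @ y
    x_pi = np.linalg.solve(G, b)
    z = np.linalg.solve(G, b ** 2)
    keep = np.argsort(-z, kind="stable")[:k]
    v = np.zeros_like(x_pi)
    v[keep] = x_pi[keep]
    return np.linalg.norm(A @ v - y)

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((8, 5)))    # 8 x 5 with orthonormal columns
A = Q * np.array([1.0, 2.0, 0.5, 3.0, 1.5])         # rescale columns: orthogonal, not orthonormal
y = rng.standard_normal(8)

errors = [(brute_force_error(A, y, k), explicit_error(A, y, k)) for k in range(1, 6)]
```

Each pair in `errors` should agree up to rounding, which is what the proposition asserts for orthogonal columns.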



The above analysis raises the following question: can we say anything in the
reverse? Supposing we are given that $\|Av^k - y\| = \|Au^k - y\|$ for all $y$, then
is it true that the columns of $A$ are orthogonal? We show below that this is
indeed true for the case of $k = 1$. For the proof of this fact we need the following
supporting lemma.

Lemma 2.1. Let $A$ be an $m \times n$ matrix such that $m > n$, $n > 1$. Assume
that $A$ has full column rank. If the diagonal entries of $(A^T A)^{-1}$ are inverses of
the diagonal entries of $A^T A$ (i.e., if $(A^T A)^{-1}_{ii} (A^T A)_{ii} = 1$) then the off-diagonal
elements of $A^T A$ (and hence the off-diagonal elements of $(A^T A)^{-1}$) are all equal
to zero.



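Proof. First the claim is proved for $n = 2$ and the general case is proved by
induction. For the case $n = 2$ assume that

$$A^T A = \begin{pmatrix} \alpha & \beta \\ \beta & \gamma \end{pmatrix} \quad \text{and} \quad (A^T A)^{-1} = \begin{pmatrix} 1/\alpha & \beta' \\ \beta' & 1/\gamma \end{pmatrix}.$$

From

$$\begin{pmatrix} \alpha & \beta \\ \beta & \gamma \end{pmatrix} \begin{pmatrix} 1/\alpha & \beta' \\ \beta' & 1/\gamma \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

we get that $\beta\beta' = 0$ and $\alpha\beta' + \beta/\gamma = 0$, which implies that $\beta = \beta' = 0$, proving
the claim for this case. Now, we assume that the claim is true for $n - 1$.
Let

$$A^T A = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$$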

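where $\Sigma_{11}$ is an $(n-1) \times (n-1)$ matrix, $\Sigma_{22}$ is a scalar, $\Sigma_{12}$ is an $(n-1) \times 1$ vector and
$\Sigma_{21}$ is a $1 \times (n-1)$ vector. It is important to note here that $A^T A$ and $\Sigma_{11}$ are
symmetric positive definite matrices. Therefore, their inverses $(A^T A)^{-1}$ and $\Sigma_{11}^{-1}$ are
also symmetric and positive definite ([Harville (2008)], Corollary 14.2.11). Using
blockwise matrix inversion, we can write $(A^T A)^{-1}$ as (see [Bernstein (2005)], p. 45)

$$\begin{pmatrix} \Sigma_{11}^{-1} + \Sigma_{11}^{-1} \Sigma_{12} (\Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12})^{-1} \Sigma_{21} \Sigma_{11}^{-1} & -\Sigma_{11}^{-1} \Sigma_{12} (\Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12})^{-1} \\ -(\Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12})^{-1} \Sigma_{21} \Sigma_{11}^{-1} & (\Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12})^{-1} \end{pmatrix}$$

Here, $(\Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12})$ is the Schur complement of $\Sigma_{11}$ in $A^T A$ and is positive
definite as $A^T A$ and $\Sigma_{11}$ are both positive definite (see [Boyd & Vandenberghe (2004)],
Appendix A.5.5). Hence, the above blockwise matrix inversion formula is valid.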
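Now, from

$$(\Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12})^{-1}\, \Sigma_{22} = 1$$

we get that $\Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12} = \Sigma_{12}^T \Sigma_{11}^{-1} \Sigma_{12} = 0$. As $\Sigma_{11}^{-1}$ is positive definite, we
obtain that $\Sigma_{12} = 0$. Therefore,

$$(A^T A)^{-1} = \begin{pmatrix} \Sigma_{11}^{-1} & 0 \\ 0 & \Sigma_{22}^{-1} \end{pmatrix}$$

The diagonal entries of $\Sigma_{11}^{-1}$ are therefore the inverses of the diagonal entries
of $\Sigma_{11}$, so the induction hypothesis applies and $\Sigma_{11}^{-1}$ is a diagonal matrix.
Therefore, $(A^T A)^{-1}$ is diagonal and the induction step is proved. □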
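The hypothesis of Lemma 2.1 is easy to evaluate numerically. The sketch below uses a hypothetical helper `diagonal_products` and two small illustrative matrices (assumptions, not from the paper) to compute the products $(A^T A)^{-1}_{ii} (A^T A)_{ii}$, once for orthogonal columns and once for non-orthogonal columns.

```python
import numpy as np

def diagonal_products(A):
    """Hypothetical helper: products (A^T A)^{-1}_{ii} * (A^T A)_{ii} for each i."""
    G = A.T @ A
    return np.diag(np.linalg.inv(G)) * np.diag(G)

# Orthogonal columns: A^T A = diag(1, 4), so every product equals 1.
A_orth = np.array([[1.0, 0.0],
                   [0.0, 2.0],
                   [0.0, 0.0]])

# Non-orthogonal columns: A^T A = [[1, 1], [1, 2]] with inverse [[2, -1], [-1, 1]],
# so both products equal 2 and the hypothesis of the lemma fails.
A_skew = np.array([[1.0, 1.0],
                   [0.0, 1.0],
                   [0.0, 0.0]])

prod_orth = diagonal_products(A_orth)
prod_skew = diagonal_products(A_skew)
```

In line with the lemma, the products are all equal to one only in the case where $A^T A$ is diagonal.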



Proposition 2.2. Let $A$ be an $m \times n$ matrix such that $m > n$, $n > 1$. Assume
that $A$ has full column rank. Then,

$$\|Au^1 - y\| = \|Av^1 - y\| \quad \forall y \in \mathbb{R}^m$$

if and only if the columns of $A$ are orthogonal.

Proof. The implication from orthogonality of the columns to equality of the
errors has already been proved in Proposition 2.1. To prove the converse, we
show the existence of a few $y$'s such that if $\|Au^1 - y\| = \|Av^1 - y\|$ for all
these choices of $y$ then $A^T A$ is diagonal. We
first choose $y = A (A^T A)^{-1} [1, 0, 0, \ldots, 0]^T$ and look for solutions $u^1$ and $v^1$.
For the above choice of $y$ expression (3) reduces to:

$$[1, 0, 0, \ldots, 0]\, C (C^T A^T A C)^{-1} C^T\, [1, 0, 0, \ldots, 0]^T$$

It is easy to see that the $C$ which maximizes the above expression is $C =
[1, 0, 0, \ldots, 0]^T$ as for any other choice of $C$ the expression equals zero.
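Now, we shall show that

$$(A^T y)^T \left( (A^T A)^{-1} R^T R (A^T A) R^T R (A^T A)^{-1} - 2 R^T R (A^T A)^{-1} + C (C^T (A^T A) C)^{-1} C^T \right) (A^T y) \qquad (5)$$

can equal zero only for the choice $R = [1, 0, 0, \ldots, 0]$. For this, let

$$A^T A = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ p_{21} & p_{22} & \cdots & p_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{n1} & p_{n2} & \cdots & p_{nn} \end{pmatrix}, \quad (A^T A)^{-1} = \begin{pmatrix} \gamma_{11} & \gamma_{12} & \cdots & \gamma_{1n} \\ \gamma_{21} & \gamma_{22} & \cdots & \gamma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{n1} & \gamma_{n2} & \cdots & \gamma_{nn} \end{pmatrix}$$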

When $R = [1, 0, 0, \ldots, 0]$ we get that the value of expression (5) is equal to
$p_{11} \gamma_{11}^2 - 2\gamma_{11} + \frac{1}{p_{11}}$. This value can be made zero by choosing $\gamma_{11} = \frac{1}{p_{11}}$. For any
other choice of $R$ such that $R_i = 1$, $i \neq 1$, the value of expression (5) is equal
to $p_{ii} \gamma_{1i}^2 + \frac{1}{p_{11}}$. This value cannot be made zero by any choice of $\gamma_{1i}$ as the
diagonal entries $p_{ii}$ and $p_{11}$ are positive.
As expression (5) has to equal zero from our initial assumption, we are forced
to choose $\gamma_{11} = \frac{1}{p_{11}}$.

Now, we choose $y = A (A^T A)^{-1} [0, 1, 0, \ldots, 0]^T$ and obtain that $\gamma_{22} = \frac{1}{p_{22}}$.
By continuing in this fashion, we get that $\gamma_{ii} = \frac{1}{p_{ii}}$ $\forall i \in \{1, 2, 3, \ldots, n\}$. Finally,
we apply Lemma 2.1 and get that both $A^T A$ and $(A^T A)^{-1}$ are diagonal. □
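The choice of $y$ used in the proof can be tried on a small matrix whose columns are not orthogonal. The sketch below is an illustration under assumptions (the $3 \times 2$ matrix and the variable names are hypothetical): it builds $y = A (A^T A)^{-1} [1, 0]^T$ and compares the error of $v^1$ with the best single-column least-squares fit.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [0.0, 0.0]])                 # columns are not orthogonal
G = A.T @ A
e1 = np.array([1.0, 0.0])
y = A @ np.linalg.solve(G, e1)             # y = A (A^T A)^{-1} e_1, hence A^T y = e_1

# v^1: keep the entry of the pseudo-inverse solution ranked first by z.
b = A.T @ y
x_pi = np.linalg.solve(G, b)
z = np.linalg.solve(G, b ** 2)
v1 = np.zeros(2)
top = np.argmax(z)                         # first index attaining the largest z value
v1[top] = x_pi[top]
err_v = np.linalg.norm(A @ v1 - y)

# u^1: best least-squares fit using a single column (brute force over 2 columns).
err_u = min(
    np.linalg.norm(A[:, j] * (A[:, j] @ y) / (A[:, j] @ A[:, j]) - y)
    for j in range(2)
)
```

Here $y = [1, -1, 0]^T$, the error of $v^1$ is $\sqrt{2}$ and the brute force error is $1$, so the two errors differ, consistent with Proposition 2.2.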